CS – 2002 – 03 A Large, Fast Instruction Window for Tolerating Cache
Abstract
Instruction window size is an important design parameter for many modern processors. Large instruction windows offer the potential advantage of exposing large amounts of instruction level parallelism. Unfortunately, naively scaling conventional window designs can significantly degrade clock cycle time, undermining the benefits of increased parallelism. This paper presents a new instruction window design targeted at achieving the latency tolerance of large windows with the clock cycle time of small windows. The key observation is that instructions dependent on a long latency operation (e.g., cache miss) cannot execute until that source operation completes. These instructions are moved out of the conventional, small, issue queue to a much larger waiting instruction buffer (WIB). When the long latency operation completes, the instructions are reinserted into the issue queue. In this paper, we focus specifically on load cache misses and their dependent instructions. Simulations reveal that, for an 8-way processor, a 2K-entry WIB with a 32-entry issue queue can achieve speedups of 20%, 84%, and 50% over a conventional 32-entry issue queue for a subset of the SPEC CINT2000, SPEC CFP2000, and Olden benchmarks, respectively.
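The mechanism described in the abstract can be illustrated with a minimal software sketch. This is a toy model under stated assumptions, not the paper's hardware design: the names `Instr`, `WibModel`, `wait_bit`, and the string miss identifiers are hypothetical, registers are assumed to be already renamed (each written once), and timing, selection logic, and the load instruction itself are not modeled. It only shows instructions that depend on an outstanding load miss leaving the small issue queue for a larger buffer and returning when the miss completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    seq: int           # program-order sequence number
    dest: str          # destination register (assumed already renamed)
    srcs: tuple = ()   # source registers


class WibModel:
    """Toy model: small issue queue backed by a larger waiting instruction buffer."""

    def __init__(self, iq_size=32, wib_size=2048):
        self.iq_size = iq_size
        self.wib_size = wib_size
        self.issue_queue = []   # small structure, kept in program order
        self.wib = {}           # hypothetical miss id -> list of parked instructions
        self.wait_bit = {}      # register -> id of the miss it (transitively) waits on

    def wib_occupancy(self):
        return sum(len(parked) for parked in self.wib.values())

    def dispatch(self, instr):
        """Place an instruction in the issue queue; return False if it is full."""
        if len(self.issue_queue) >= self.iq_size:
            return False
        self.issue_queue.append(instr)
        self._park_dependents()
        return True

    def load_miss(self, miss_id, dest):
        """Record that register `dest` waits on an outstanding load miss."""
        self.wait_bit[dest] = miss_id
        self.wib.setdefault(miss_id, [])
        self._park_dependents()

    def _park_dependents(self):
        """Move instructions with a waiting source from the issue queue to the WIB."""
        kept = []
        for instr in sorted(self.issue_queue, key=lambda i: i.seq):
            miss_id = next(
                (self.wait_bit[s] for s in instr.srcs if s in self.wait_bit), None
            )
            if miss_id is not None and self.wib_occupancy() < self.wib_size:
                self.wib[miss_id].append(instr)
                # Mark the result as waiting so transitive dependents are parked too.
                self.wait_bit[instr.dest] = miss_id
            else:
                kept.append(instr)
        self.issue_queue = kept

    def miss_complete(self, miss_id):
        """Clear wait bits for `miss_id` and reinsert its parked instructions.

        Returns the instructions moved back into the issue queue. Any that do
        not fit stay parked until this method is called again.
        """
        self.wait_bit = {r: m for r, m in self.wait_bit.items() if m != miss_id}
        parked = sorted(self.wib.pop(miss_id, []), key=lambda i: i.seq)
        free = self.iq_size - len(self.issue_queue)
        moved, left = parked[:free], parked[free:]
        self.issue_queue.extend(moved)
        if left:
            self.wib[miss_id] = left
        # An instruction may still depend on a different outstanding miss.
        self._park_dependents()
        return moved


if __name__ == "__main__":
    model = WibModel(iq_size=4, wib_size=8)
    model.load_miss("A", "r1")                    # a load into r1 misses in the cache
    model.dispatch(Instr(1, "r2", ("r1",)))       # direct dependent of the miss
    model.dispatch(Instr(2, "r3", ("r2",)))       # transitive dependent
    model.dispatch(Instr(3, "r4", ("r9",)))       # independent instruction
    print("issue queue:", [i.seq for i in model.issue_queue])
    print("parked on A:", [i.seq for i in model.wib["A"]])
    model.miss_complete("A")
    print("after completion:", [i.seq for i in model.issue_queue])
```

In this sketch the independent instruction keeps its issue queue slot while the two dependents are parked, which is the property that lets a small issue queue behave like a much larger window during a miss.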
Similar resources
Checkpoint Processing and Recovery: An Efficient, Scalable Alternative to Reorder Buffers
Achieving high performance in modern microprocessors requires a combination of exposing large amounts of instruction level parallelism (ILP) and processing instructions at a high clock frequency. Exposing maximum ILP requires the processor to operate concurrently on large numbers of instructions, also known as the instructio...
Practical Precise Evaluation of Cache Effects on Low Level Embedded Vliw Computing
The introduction of caches inside high performance processors provides a technical means of reducing the memory gap by tolerating long memory access delays. While such intermediate fast caches accelerate program execution in general, they have a negative impact on the predictability of program performance. This lack of performance stability is an undesirable characteristic for embedded computing. W...
Microarchitecture for Billion-Transistor VLSI Superscalar Processors
Gabriel Hsiuwei Loh, 2002. The vast computational resources in billion-transistor VLSI microchips can continue to be used to build aggressively clocked uniprocessors for extracting large amounts of instruction level parallelism. This dissertation addresses the problems of implementing wide issue, out-of-order execution, supersca...
Effects of Multithreading on Cache Performance
As the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism. The question, however, remains as to how effective multithreading is on tolerating memory latency. The...
Scaling Instruction Window
Contemporary superscalar processors employ large instruction windows to tolerate long latencies (mainly second-level cache misses) and to explore more instruction level parallelism (ILP). On the one hand, a larger instruction window can buffer a larger number of instructions and find more independent instructions to execute; on the other hand, simply scaling instruction window as a unified and single u...
Publication date: 2002